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Abstract 

We present an algebraic method for constructing a highly effective 
coarse grid correction to accelerate domain decomposition. The coarse 
problem is constructed from the original matrix and a small set of input 
vectors that span a low-degree polynomial space, but no further knowledge 
of meshes or continuous functionals is used. We construct a coarse basis 
by partitioning the problem into subdomains and using the restriction of 
each input vector to each subdomain as its own basis function. This basis 
resembles a Discontinuous Galerkin basis on subdomain-sized elements. 
Constructing the coarse problem by Galerkin projection, we prove a high- 
order convergent error bound for the coarse solutions. Used in a two- 
level symmetric multiplicative overlapping Schwarz preconditioner, the 
resulting conjugate gradient solver shows optimal scaling. Convergence 
requires a constant number of iterations, independent of fine problem 
size, on a range of scalar and vector-valued second-order and fourth-order 
PDEs. 


1 Introduction 

Many discretizations of elliptic partial differential equations (PDEs) lead to 
sparse symmetric positive definite (SPD) linear systems of the form Au = f . For 
large 3D problems, iterative solvers such as preconditioned conjugate gradient 
(CG) are usually necessary. The condition number often grows quickly as h —> 0, 
where h is the mesh element size used for discretization: the quality of the 
preconditioner becomes the crucial factor for efficiency and robustness. 

With an optimal preconditioner, the linear system can be solved to desired 
precision in a time which scales linearly with the problem size. Two popular 
and related frameworks for potentially optimal preconditioning are multigrid 
(MG) [6] and domain decomposition (DD) algorithms [14, 17]; we focus on the 
latter in this paper. The key component we present is a coarse discretization of 
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the PDE, using larger elements of size H , providing the coarse grid correction 
to accelerate global convergence. 

A critical factor in selecting a solver is the question of how much domain 
knowledge the preconditioner requires. Geometric approaches require the prac¬ 
titioner to re-discretize the PDE at multiple scales, which for irregular domains 
and/or coefficients may be challenging. In contrast, algebraic approaches work 
almost entirely with the matrix A. While algebraic methods may be more dif¬ 
ficult to develop, they can provide benefits in both ease of use and in handling 
irregular problems. 

The method we propose here is essentially algebraic, but uses additional 
discrete information: we ask for a small set of generating vectors that span 
the space of degree p polynomials. We construct a coarse basis by algebraically 
partitioning the domain into subdomains and using the restriction of each gener¬ 
ating vector to each subdomain as its own basis function. The resulting coarse 
space functions are piecewise-smooth, with jumps at subdomain boundaries. 
From this basis, we construct a coarse problem by Galerkin projection. 

We derive an error bound on the solutions to the coarse problem, and show 
that it is a high-order accurate convergent coarse grid approximation for a va¬ 
riety of PDEs and discretizations. Convergence requires a limited coarsening 
factor [ H /h\ and sufficiently large p. Combined with DD in a Krylov method, 
we observe the number of required iterations decreases rapidly with p, and 
has reduced dependence on [ H /h\, e.g., maintaining optimal scaling in the case 
h = H 2 . 

For any finite resolution of the fine problem, our coarse bases may or may 
not be interpreted as discontinuous. However, in the limit as h —> 0 with H 
fixed, they are equivalent to the bases used in Discontinuous Galerkin (DG) 
methods [1]. We call our coarse basis functions discretely-discontinuous, giving 
rise to the name Discretely-Discontinuous Galerkin (DDG) for the approach. 

We provide both theoretical and numerical evidence that DDG provides a 
convenient tool for easily constructing highly effective coarse grid corrections 
for a wide range of problems, varying over the type of discretization (e.g., clas¬ 
sic finite elements or finite differences), the domain (from Cartesian grids to 
adaptive unstructured meshes), and the underlying PDE (e.g., vector-valued 
elasticity and fourth-order biharmonic problems). 


2 Related Work 

Our approach is closely related to the aggregation-based algebraic methods for 
constructing a coarse basis. For a more thorough review of aggregation tech¬ 
niques in the MG context, we refer the reader to review paper by Stiiben [15]. 
We review the most closely related ideas and the DD setting. 

The performance of non-smoothed aggregation, like ours, depends critically 
on [ H /h\. The simplest aggregation algorithm produces a piecewise constant 
coarse space. If [ H /h] = 0(1), then this preconditioner applied to the Poisson 
problem has condition number bounded independent of h [12, 13]. 
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For elasticity and higher-order PDEs, a piecewise constant basis is insuffi¬ 
cient. Better aggregation techniques have been derived by requiring additional 
user input: the vectors that span the (near-)nullspace of the PDE [19], e.g., the 
rigid modes for elasticity and the linear polynomials for biharmonic problems. 

These techniques are already optimal, in the sense that the number of iter¬ 
ations is bounded independent of problem size. Improvements to the iteration 
count can come in the form of constant factor reductions and reduced depen¬ 
dence on [ H /h] (or geometric dependencies such as the PDE, domain, coefficients, 
etc.). Many works present modifications to aggregation-based techniques that 
improve their performance in these ways. 

Despite the optimal scaling of non-smoothed aggregation, when [ H /h) is large, 
the aggregation-based coarse solution is a poor approximation to the actual 
solution. Galerkin projection finds a solution which is optimal in energy norm, 
but the near-discontinuities at subdomain boundaries dominate the energy. One 
way to reduce this dependence on [ H /h] is to keep the aggregation basis but apply 
a non-Galerkin projection, as in over-correction methods that apply a scaling to 
the Galerkin solution [3, 4]. In practice, this significantly improves the results. 

Alternatively, one can work to change the basis. Sala et al. [13] show that 
the subdomains used for aggregation can be smaller than those used for the DD 
smoothing step. Following this idea, the associated term in the bound on the 
condition number reduces geometrically with H. This requires some additional 
work to come up with the extra partitions, and enlarges the size of the coarse 
problem. 

Another alternative is smoothed aggregation, which smooths the basis func¬ 
tions, thus reducing the steep jumps at subdomain boundaries. For the Poisson 
problem, this transforms an H /h term in the condition number bound into an 
H /d term, where d is the smoothing diameter [12]. This keeps the size of the 
coarse problem the same as basic aggregation, but requires additional work to 
smooth the basis, and can increase the number of nonzeros in the coarse matrix. 

Our method reduces to non-smoothed aggregation if the only generating vec¬ 
tor is the constant vector, and our method inherits the upper bounds proven for 
non-smoothed aggregation. We increase the performance beyond non-smoothed 
aggregation by using a higher-order basis, which creates to a high-order accurate 
rediscretization of the input PDE. By using a p th order coarse basis we reduce 
the energy at subdomain boundaries from H /h to # p+1 /fc. We find the added 
power of higher-order bases greatly reduces the required number of iterations. 

Beyond non-smooth aggregation, discontinuous functions have appeared within 
DD algorithms before. For example, the restricted additive Schwarz method 
generates discontinuities at subdomain boundaries and has improved perfor¬ 
mance relative to the equivalent smooth method [7, 10]. When using Discon¬ 
tinuous Galerkin (DG) discretizations, discontinuities are already present in the 
fine problem. Previous works have developed DD solvers specifically for DG 
[8, 9], or agglomorated fine DG problems to construct coarse ones [2]. While 
our approach uses a basis like that of DG methods, it does not require that the 
fine problem be discretized with DG, but can be interpreted as a rediscretization 
of the problem using a DG basis on elements of size H. 
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3 Preliminaries 

The DDG algorithm requires no geometric interpretation or information for 
implementation (the generating vectors which typically would contain a basis 
for polynomials in the nodal coordinates are treated as a black box). However, 
our understanding and analysis is intimately tied to a geometric interpretation, 
so we frequently refer to the geometric properties for simplicity. For reference, 
table 1 lists the major symbols used throughout this paper. They are also each 
defined at first use. 


Table 1: Common symbols used throughout the paper. 


input, fine grid system 

Au = f 

input, generating basis (tall matrix) 

F 

restriction to coarse space 

Ro 

restriction to overlapping subdomain i > 0 

R 

coarse grid matrix 

Ao = RoARj 

coarse grid system 

AoUo = Rof 

element diameter in fine grid 

h 

element diameter in coarse grid 

H 

coarsening factor 

PA] 

polynomial degree used in coarse basis 

P 

subdomain overlap, in geometric distance 

S 

subdomain overlap, in algebraic graph distance 

A 

spatial dimension 

d 


4 The Coarse Grid 



Figure 1: For a simple Poisson problem with random right-hand-side, one-level 
DD produces a piecewise-smooth error after one iteration. From left to right: 
partition, error after DD smoothing, x-derivative of error 

An effective coarse grid needs to be able to approximate the error left by 
the one-level DD method. After a single pass of one-level DD, the error is 
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1 

%F(i,j) = input basis j evaluated at 

node i . 

2 

% Node i is in subdomain partition (i) 
index). 

(zero—based ^ 

3 

function RO = basis (F ,partition) 


4 

k = size (F ,2) ; 


5 

[i , j ,s] = find (F) ; 


6 

Rt = sparse(i, partition)i)*k+j,s 

); 

7 

[Rt,~] = qr (Rt ,0) ; % optional ortho 

gonalization . 

8 

RO = Rt ' ; 



Figure 2: MATLAB code to build the restriction Ro from input vectors F and 
a partition. 


extremely smooth in each subdomain, but not across subdomain boundaries 
(figure 1). From this structure, we are motivated to use piecewise higher-order 
polynomials which have a similar piecewise-smooth structure. 

Our coarse approximation space consists of piecewise polynomials which 
are smooth within each subdomain, but have arbitrary jumps at subdomain 
boundaries. The construction uses several user-supplied vectors, arranged in 
the columns of F, that span a degree p polynomial space. For many discretiza¬ 
tions, F can be easily built using the nodal coordinates of the mesh and the 
constant vector. 

To construct the coarse restriction and coarse system matrix, we follow the 
same Galerkin projection as used in previous aggregation methods (e.g. [19]); the 
difference is in our choice of F. The subdomains are built by partitioning the 
discrete domain into non-overlapping subdomains fb (i.e. subsets of indices) 
containing approximately ( H /h) d nodes for problems in dimension d. Unless 
otherwise indicated, all of our examples were partitioned using with a graph- 
based discrete algorithm from the SCOTCH library [11]. 

The coarse basis for subdomain i is spanned by the columns of cj>i = R^R^F, 
where R^ is the restriction to the i th partition domain (with no overlap). The 
final coarse restriction is made by concatenating these basis vectors together as 
rows in Ro. To help with conditioning, we orthogonalize Ro, which may be done 
independently for each subdomain as there is no overlap. Orthogonalization is 
optional, but the remainder of the paper assumes Ro is orthogonal to simplify 
the analysis. Figure 2 shows simple (albeit inefficient) MATLAB code for this 
construction. A robust implementation must also detect when cf>i is not full 
rank, and discard columns as necessary. 

The coarse matrix Ao is constructed by Galerkin projection, Ao = RoARj, 
and the coarse approximation to u (of Au = f) is given by Rjuo where AoUo = 
R 0 f. 

The size of A 0 increases with p, both in rank Q(p d ) and in the number of 
non-zeros 0(p 2d ). However, the block sparsity pattern of A 0 is independent of p. 
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It has the same sparsity pattern as the subdomain adjacency matrix, but with 
each non-zero replaced with a small dense block with size dependent on p. This 
structure allows for efficient numerical linear algebra using dense storage and 
operations. The optimal choice of p, in terms of total work to solve the problem, 
will depend on both the problem at hand and details of the implementation, 
but we generally found p = 3, cubic polynomials, is a good default. 

5 Coarse Grid Analysis 

We show that, under moderate assumptions, the error of the coarse solution is 
bounded by 

IIRqUo - U|| A < + W 1/2 )M^ i+ p (n) (1) 

for PDEs of degree 2 q, where c is independent of h and H, and u is the smooth 
interpretation of u defined in the next section. When [ H /h] = 0(1) and 1 +p > q , 
this error converges at high-order in H. A convergent coarse grid approximation 
naturally allows the coarse grid correction to capture all components of the error 
not handled by fine grid smoothing, leading to optimality. 

We present two arguments for convergence of the coarse grid. First, we 
present an argument for FEM discretizations leveraging the extensive theory 
surrounding FEM and Sobolev norms. Second, we give an alternative argument 
that depends only on some discrete algebraic properties, which must be shown 
for each particular discretization. 

5.1 Error Bound Using Geometric Properties 

Here we restrict our attention to the finite element method. 

Let the domain be partitioned into subdomains fU Let u be in the 
Sobolev space W£,(fi) (i.e. it should have bounded g th -order weak derivatives) 
and suppose u is C°° in each subdomain. Let Uh G be a FEM interpolant 
of this function on some mesh. We assume that both the subdomains and 
the mesh elements satisfy all the usual regularity assumptions for meshes with 
elements of diameter H and h respectively. We represent Uh with a discrete 
vector u, assuming a nodal basis, so = u{xi). Furthermore, we assume that 
the FEM interpolant satisfies ||«||w9 < cHu/Jlw®, which is true when ||u||w® = 
ll u /i||w| + 0(h) and h is smaller than some hg. 

Let the PDE be given as a symmetric elliptic bilinear form, a{u , u ) = 
j n k{u] 2 dVt : with some linear functional k[-] involving up to q th order deriva¬ 
tives. For example, k = V for the Poisson problem or k = V 2 for the bi¬ 
harmonic problem. We assume continuity a(u,u) < c||m||^, , where c denotes 
an arbitrary constant independent of h and H. Discretized with the FEM, 
u T Au = a(u h ,u h ). 

Each FEM nodal point lies within exactly one subdomain, and has an asso¬ 
ciated basis function. In some areas, basis functions from multiple subdomains 
overlap. Let the union of all mesh elements containing these overlapping areas, 
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Figure 3: A simple finite element mesh is divided into three subdomains. The el¬ 
ements separating the subdomain interiors from each other and from the bound¬ 
ary conditions are Cl cu t. In Cl su b, the error is small because the approximation 
is good. The error is larger in Cl cu t, but the total area is small enough that it 
does not hinder convergence. 

plus any elements touching the boundary of Cl, be Cl cut , and let Cl su b = Cl \ Cl cut 
(figure 3). For the purposes of the proof, we introduce additional bilinear forms 
a cu t(u,u ) = f Q k[u] 2 dCl and a su b defined analogously, along with their dis¬ 
cretizations A cut and A su b- We note that a = a cu t+a su b and A = A cu t+A su b. 

If the mesh and subdomains are sufficiently regular and the FEM basis 
functions have the usual compact support, then the (d-dimensional) volume 
ni^cut) < c[ h /h\. It is linearly dependent on h because Cl cut is in a band of 
thickness ch around the subdomain boundaries, and inversely dependent on H 
because that is the rate at which the total subdomain surface area grows. 

As in any Galerkin scheme, the coarse solution Rjuo is the minimum error 
solution in the energy norm over the entire coarse space. Therefore the error is 
bounded by that of any particular coarse vector, including v = RjRoU. Using 
this and the splitting, 


l|Ro u o -u|| a < ||v — u|| A 


( 2 ) 

(3) 

(4) 



< |v - u| Asub + |v - u| Acut 


-cut 


Before tackling either of these terms, consider what v = RjRoU is. Because 
Ro is orthogonal, RjRoU is the I 2 projection of u onto the coarse space. By 
construction of the coarse space, v has a continuous interpretation v that is a 
degree p polynomial in each subdomain flj. We can find v directly from u by a 
per-subdomain least-squares approximation of u by a degree p polynomial, min¬ 
imizing the sum of the squared error at each of the FEM nodal points. Barring 
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pathological distributions of mesh nodes, v will be a high-order approximation 
to u, satisfying the same error bounds commonly derived for FEM interpolants 
on a mesh with elements of size H. 

Now, we can bound the discrete error in terms of the geometric functions. 
We cite the appropriate theorems from Brenner et al. [5] for Sobolev and FEM- 
interpolant inequalities. Looking first in f l su b, we find that the coarse polyno¬ 
mials are a high-order approximation: 


- u |A sub 

error interior to subdomains 

(5) 

\ v h ~ U h\a su b 

discrete and FEM energy are equal 

(6) 

c\\vh - Uh\\w%(n sub ) 

continuity assumption 

(7) 

c\\v - u w 2 5 (n sub ) 

convergent FEM for sufficiently small h 

(8) 

c \\ v - U \\w£(n sub ) 

broken semi-norm defined below 

(9) 

c \\ v ~ u Ww%(n) 

increasing domain only increases the norm 

(10) 

C JJ +P q \ U \~l 

Theorem 4.4.20 [5] 

(11) 

C H +p q \u\^i+p^ n s ) 

2-norm vs. oo-norm 

(12) 


Here the broken semi-norm W q (Q) on domain Q is defined as a sum over sub- 
domains: 



(13) 


This is naturally extended to a maximum over subdomains for a = oo. 

Turning to f2 cut , the coarse polynomials are not a high-order approximation 
in the energy norm, because u is not smooth and v is not even continuous in 
this region. However, f l cut is small enough that L°° bounds are sufficient. 


I v - U k cut 

= | V h - U h \a cu t 

< c\\v h - Uh\\w°(n cut ) 

< ch~ q \\v h - U h ||L2(fi out ) 

< ch q fr(£l C ut) ^ \\ v h — Uh\\L a °(n cut ) 

< ch- q [ h / H ] 1/2 \\v h - u h \\ Lao(Ucnt) 

< ch~ q [ h /H] l/2 \\v - u||i-(n c „ t ) 


error at subdomain boundaries 

(14) 

discrete and FEM energy are equal 

(15) 

continuity assumption 

(16) 

Theorem 4.5.12 [5] 

(17) 

2-norm vs. oo-norm 

(18) 

mesh and subdomain regularity 

(19) 

stability of interpolation, 4.4.1 [5] 

( 20 ) 
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Theorem 4.4.20 [5] 

( 21 ) 

< ch~ q [ h /H ] 1 ' 2 H 1+p \u\^i+ P ^ increasing domain 

( 22 ) 

= cH 1+p - q [ H /h\ q ~ 1 / 2 \u\^,i + p( n s ) factor to match eq. (12) 

(23) 

Combining the two bounds 12 and 23 completes the error bound 

IIRqUo - U|| A < cH^~ q ( 1 + [ H / h r 1/2 )\n\^ +P{n) (24) 

5.2 Error Bound Using Algebraic Properties 

We can derive a similar bound based purely on algebraic components. As be¬ 
fore, let v = RjRou and A = A cu t + A su b be a splitting into symmetric 
positive semi-definite components. This splitting need not correspond to the 
FEM definition given earlier. However, we require that A cut = [ B “ ut °] with 
B C iii G K mxm and m < c[ h /ii\h d . This is usually true and plays the role of 
Ai(tlcui) from the geometric proof. 

We assume that for all u £ V, 


II A cut || 2 < ch 2q , 

(25) 

IMIoo < ch d/2 ||u|| 2 , 

(26) 

u — V A sub < cH 1+P ~ q \\u\\ 2 , 

(27) 

and 1 1 u - v||oo < cH 1+p \\u\\ x . 

(28) 


The constants c include the roughness term M^i+p^ from the geometric proof. 
Geometrically speaking, the subspace V must be restricted to functions with 
bounded roughness. 

Showing that these assumptions are true for a particular discretization could 
exploit geometric properties as in the previous section. 

From these assumptions, the convergence argument follows the exact same 
structure as in the geometric case and we do not repeat it. The final error bound 
is similar to the above: 

||R?uo - u|| A < C H 1+ p-*( 1 + W 1/2 )||u|| 2 . (29) 

5.3 Proof vs. Practice 

The proof and our use of the coarse grid in practice are not entirely consistent 
with each other. The coarse grid is used to approximate the error after applying 
one-level DD. For PDEs with smooth coefficients, as h —0 with H fixed, the 
error after one-level DD is piecewise C°° as in the proof. For any finite h, it is 
only an approximation as accurate as the discretization. 
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In problems with discontinuous coefficients, even as h —> 0, the error after 
one-level DD is not C°° in each subdomain. It has kinks where the coefficients 
have discontinuities. We tried matching the partition boundaries to the disconti¬ 
nuities, or using a piecewise generating basis F that matches the discontinuities. 
We observed optimal scaling even without these strategies, but either strategy 
significantly accelerated convergence with high-order polynomials. 

When we use non-trivial overlap between subdomains, the boundaries of the 
smooth regions do not line up with the discontinuities in the coarse space. This 
is easy to resolve by adjusting the coarse subdomains, but this introduces more 
variation in the coarse subdomains’ size and shape. In numerical experiments, 
better results were obtained by ignoring this inconsistency with the proof and 
keeping the original subdomain shapes. 

The roughness term in the error bound can increase with p , suggesting that 
the error can actually increase with p. However, because increasing p always 
grows the coarse vector space, and the Galerkin solve is optimal in that space, 
increasing p never increases the error. 

6 Domain Decomposition 

We combine the coarse grid within a standard multiplicative algebraic DD 
framework. 

The DD subdomains begin with the same partition computed for the con¬ 
struction of the coarse basis. From the partition, overlapping subdomains are 
algebraically constructed by expanding the partition to include nodes within 
graph distance A in the graph defined by A. The expanded subdomains flj 
overlap in geometric bands of size <5. For simple meshes and discretizations, 
6 = (1 + 2A )h. Unless otherwise indicated, we use minimal overlap A = 0. 

Let Ri be the restriction matrix, such that U; = R;U is the vector of the 
elements of u from subdomain I\, i.e. R* is a subset of the rows of the identity 
matrix. For each subdomain, the local problem uses the matrix A i = R.; AR^, 
and we solve these subdomain problems exactly. Given a current approximation 
Ufc , processing subdomain i updates the approximation to 

u fc+ i = u fc + R^A^R^f - Au fc ). (30) 

Iterating over all subdomains and updating the approximation to u after each, 
we arrive at the algorithm for one-level multiplicative overlapping Schwarz. 

To build a two-level method, we multiplicatively combine one pass of one- 
level Schwarz as a pre-smoother, the coarse problem solution, and another pass 
through the subdomains as a post-smoother. The post-smoother is done in 
reverse order, making the entire operation symmetric and usable with CG. 

We give some experimental results with a three-level method operating in a 
V-cycle. We construct the three-level problem by taking the two-level algorithm 
and applying it again to the coarse matrix Ao to make an even coarser matrix 
Ai. To do this, we need a coarsened version of the generating vectors, which 
are simply F 0 = RqF. The coarsened coarse problem is equivalent to directly 
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coarsening the original problem with larger subdomains. We keep the ratio 
between physical element sizes in adjacent levels the same (i.e. h/Ho = Hq/Hi). 
The algebraic overlap A used for smoothing is also the same at all levels. 

6.1 Condition Number 

For several special cases, our approach reduces to previously published aggre¬ 
gation approaches. For the Poisson problem using p = 0, Sala [12] showed 
that the additive variant of the preconditioner has condition number bounded 
by 0(1 + [ H /h]). For elasticity and biharmonic problems with p = 1, our ap¬ 
proach is essentially a non-smoothed two-level variant of the multigrid method 
described by Vanek et al. [19]. Their coarse grid uses the zero-energy modes, 
which are p = 1 for biharmonic and a subset of p = 1 for elasticity. They later 
prove optimal convergence, but only for the Poisson problem [18]. We do not 
have a condition number bound showing the dependence on p and q. However, 
increasing p beyond the low-order choices in the literature increases the dimen¬ 
sion of the coarse space, which does not have a negative effect on convergence 
it can only increase the rate of convergence. 

7 Numerical Experiments 



Figure 4: Left: sample annulus mesh used for Poisson problems B,C,D, with 
shading showing S + 10J. Right: sample adaptive mesh used for elasticity 
problem E. 

We demonstrate the performance of our coarse grid and DD as a precon¬ 
ditioner for CG on a variety of PDEs and discretizations. Unless otherwise 
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noted, all problems are solved to a 10~ 9 reduction in residual after the first 
application of the preconditioner. The right-hand-side vector f is a random 
Gaussian-distributed vector, and the initial guess for u is 0. For the sake of eas¬ 
ier reporting, we consider all problems with uniform meshes to be scaled such 
that rank(A) = n = h~ d , where d is the spatial dimension. Consequently, H~ d 
is the number of subdomains. 

The graph of the logarithm of the residual vs. the iteration count is typically 
very straight. Therefore we measure not just the integer iteration on which 
the residual is first smaller than the tolerance, but also the fractional iteration 
count at which the linear interpolation of this graph meets the tolerance. We 
found this reveals a lot of otherwise hidden detail, and this is shown in some 
figures. When CG takes more than 1000 iterations, we stop the solve and report 
iteration bounds based on condition number estimates derived from the Lanczos 
coefficients computed during CG. For converged problems, these bounds agreed 
very well with actual iteration counts. 

Scaling with h and H are already well explored in the existing aggregation- 
based literature. Our approach does not perform significantly differently along 
these axes, so we concentrate on the dependence on p and the novel scaling 
regimes that our approach can handle. 

We use the following problems: 

(A) Poisson 3D. V • Vu = / discretized on an m x m x m regular grid with 
a 7-point finite difference stencil. One face of the cube has a Dirichlet 
boundary condition and the remainder are Neumann. For this problem, 
we partition using recursive inertial partitioning [16] so that the matrix 
need never be explicitly constructed. The first partition uses a randomly- 
oriented plane to ensure irregularly shaped subdomains. 

(B) Smooth Poisson. V • SVu = / discretized with piecewise linear finite 
elements on a 2D unstructured triangle mesh of a circular annulus with 
outer radius 5 times the inner radius. Both inner and outer boundaries 
use Dirichlet conditions. The scalar function S = exp(l + sin(7r(:r + y))) 
is smooth. See figure 4. 

(C) Non-Smooth Poisson. V • DX7u = / discretized as above, but with dis¬ 
continuous D = S + 100 J where J = H (0.25 + cos(7ra:) cos(27r?/)) with 
Heaviside step function H. J is an indicator function for two ‘materi¬ 
als’ in the problem. We used algebraic partitions that do not conform to 
the material boundaries, but we use generating vectors F that are piece- 
wise polynomial with respect to the material domains. This doubles the 
number of columns in F, but only subdomains that include the material 
boundary end up with additional coarse basis functions, so Aq is only 
marginally larger. In practice, we observe optimal scaling even without 
this extra work and it makes no difference with the piecewise constant ba¬ 
sis. However, with the piecewise cubic basis, this material-aware F reduces 
the iteration count by nearly one half. 
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(D) High-Order Poisson. V • V discretized as in B and C, but with continuous 
piecewise cubic finite elements. 

(E) Elasticity. (V ■ V + VV-)m = / with Dirichlet boundary conditions, with 
vector u. This is discretized on a spatially-adaptive unstructured 2D trian¬ 
gle mesh (figure 4) with piecewise linear FEM. For the generating vectors 
F, we take the degree p polynomials in each component of u. 

(F) Biharmonic. V 4 it = / on a regular m x m 2D grid discretized with a 13 
point finite difference stencil. All boundaries have homogenous Dirichlet 
and Neumann conditions. This problem uses an algebraic overlap A = 1, 
since performance is quite poor with A = 0. Also this problem is solved 
only to 10~ 6 reduction in residual, as the fine discretizations are very 
poorly conditioned causing CG to break down before reaching 10~ 9 as 
used in the above. 

Table 2 summarizes the results for solving these problems at different h, H, 
and polynomial coarse spaces from piecewise constant Pq to cubic P 3 . With fixed 
[ H /h], we observe near constant iteration counts, independent of the problem 
size, when using the two-level algorithm. Part of the increase in iteration count 
can be attributed to degrading partition quality with the algebraic partitioner. 
Experiments (not shown) with more structured partitioning of the structured 
meshes showed less variation in iteration counts. The three-level V-cycle does 
not perform nearly as well, but still appears to be sub-logarithmic in n. 

Table 3 shows the wall-clock time spent on setup (excluding partitioning) 
and solution of some problems from table 2. The 2D problems were solved 
with a MATLAB implementation that solved each subdomain problem with the 
“backslash” operator on each iteration, but stored a factorization of the coarse 
grid. The 3D problems were solved with a parallel C-l—h implementation that 
solved the subdomains with successive over-relaxation, and solved the coarse 
grid with conjugate gradient, preconditioned with incomplete Cholesky. Both 
implementations ran on a 32-core Intel Xeon E5-2690 with 256GB of RAM. In 
all cases, the bulk of the runtime is spent on the subdomain solves, despite the 
use of poorly-scaling solvers for the coarse problem. Furthermore, the runtime 
scales approximately linearly with problem size, as desired. 

To directly explore the value in using higher-order coarse bases, we solve 
a 1000 x 1000 biharmonic problem with varying p, a large coarsening factor 
[■ H /h ] = 125, and A = 2. The large coarsening factor is desirable for efficient 
parallel implementations, but significantly reduces the accuracy of the low-order 
coarse grids. As shown in figure 5, we observe the number of iterations decreases 
rapidly with increasing p. 

Note that increasing p increases the size of Ao, so there are diminishing 
returns with large p. Nonetheless, significant reductions in problem size are 
achieved for all p. The least reduction, with p = 10, is rank(Ao) = (1/284) • 
rank(A). A similar effect occurs in aggregation techniques when the aggregation 
subdomains are smaller than the subdomains used in smoothing. We compare 
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Table 2: Iterations of CG to solve various problems in n variables. Pq, i.e. non- 
smoothed aggregation, is not expected to achieve optimal scaling for V 4 ; the Pq 
column is included for comparison only. Problems marked “ x ” are too small to 
use the three-level solver - the coarsest partition would be a single subdomain. 







Two-Level 




Three-Level 


n i/d 


H/h= 

10 



H/h= 

20 


H/h =10 



Po 

Pi 

Pi 

P 3 

Po 

Pi 

Pi 

P 3 

Pi 

Pi 

P 3 


40 

36 

20 

15 

12 

35 

23 

18 

15 

X 

X 

X 

Q 

CO 

80 

41 

20 

16 

13 

51 

28 

21 

18 

X 

X 

X 

> 

160 

44 

21 

16 

13 

61 

30 

23 

19 

39 

29 

23 

> 

320 

44 

21 

16 

14 

63 

30 

23 

19 

43 

32 

26 


640 

46 

22 

16 

14 

65 

31 

23 

19 

48 

35 

34 


200 

41 

19 

14 

11 

51 

25 

20 

15 

26 

23 

19 

> 

400 

44 

19 

14 

11 

60 

26 

19 

16 

40 

29 

24 

to 

800 

48 

20 

15 

12 

64 

27 

20 

17 

43 

31 

23 

> 

1600 

52 

20 

15 

12 

68 

28 

22 

17 

45 

32 

25 


3200 

52 

21 

16 

13 

71 

29 

22 

17 

45 

32 

25 


200 

43 

19 

15 

12 

56 

27 

22 

16 

32 

27 

25 

> 

400 

46 

20 

15 

13 

64 

28 

22 

19 

42 

28 

24 


800 

48 

21 

16 

13 

67 

31 

23 

18 

43 

31 

25 

i> 

1600 

52 

21 

17 

14 

69 

33 

23 

20 

47 

34 

26 


3200 

58 

24 

17 

13 

69 

31 

24 

19 

49 

35 

27 

s 

200 

47 

22 

17 

14 

58 

29 

21 

17 

33 

24 

21 

W 

[jh 

400 

53 

22 

17 

14 

68 

29 

22 

19 

44 

34 

25 

cC 

800 

53 

24 

17 

14 

73 

31 

23 

19 

47 

33 

26 

<N 

1600 

56 

23 

19 

15 

75 

32 

25 

19 

49 

35 

28 

> 

3200 

67 

24 

18 

15 

83 

32 

24 

20 

50 

36 

28 


200 

72 

29 

21 

19 

83 

40 

32 

25 

X 

X 

X 

4 - 3 ' 

V 

400 

81 

29 

24 

18 

96 

41 

30 

25 

47 

37 

30 

4-3 

CO 

800 

82 

33 

23 

20 

104 

42 

31 

26 

67 

48 

39 

3 

1600 

76 

32 

24 

20 

107 

44 

33 

27 

69 

51 

36 

3200 

76 

33 

27 

23 

108 

47 

36 

29 

71 

50 

40 


200 

698 

62 

20 

12 

600 

154 

44 

24 

119 

40 

22 


400 

2900 

68 

21 

12 

3500 

188 

53 

27 

376 

112 

37 

I> 

800 

5700 

77 

25 

15 

6900 

184 

55 

32 

961 

156 

51 


1600 

9500 

99 

31 

15 

11000 

246 

70 

30 

2100 

169 

58 


to this approach by using p = 1 but aggregating on smaller subdomains. The 
high-p basis significantly outperforms this approach (figure 5). 

Our error bound does not strictly require that [ H /h\ = 0(1) in order to 
produce a convergent coarse grid, and accompanying optimal preconditioner. 
For Poisson (<7 = 1) using a piecewise linear coarse basis (p = 1) and substituting 
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Table 3: Execution times for the middle column of table 2 (two-level scheme 
with H/h= 20). Each entry shows the wall-clock time in seconds and the percent 
of total runtime spent on coarse grid setup and solution. Problems marked “x” 
did not converge within 1000 iterations. 


nd' d 

Po 

Pi 

P2 

P 3 


40 

79 ( 0%) 

55 ( 0%) 

42 ( 0%) 

32 ( 1%) 

(—1 

CO 

80 

152 ( 1%) 

87 ( 1%) 

64 ( 2%) 

53 ( 3%) 

> 

160 

481 ( 2%) 

238 ( 4%) 

186 ( 6%) 

160 (11%) 

> 

320 

1207 ( 3%) 

614 ( 8%) 

525 (15%) 

497 (27%) 


640 

7053 ( 5%) 

3785 (16%) 

3140 (25%) 

3563 (43%) 


200 

12 ( 2%) 

5 ( 2%) 

4 ( 3%) 

3 ( 7%) 

> 

400 

48 ( 1%) 

22 ( 2%) 

17 ( 4%) 

15 ( 8%) 


800 

246 ( 1%) 

108 ( 2%) 

87 ( 4%) 

72 ( 9%) 

p> 

1600 

1283 ( 1%) 

572 ( 2%) 

423 ( 4%) 

358 ( 9%) 


3200 

6491 ( 0%) 

2767 ( 2%) 

2107 ( 4%) 

1711 ( 9%) 


200 

11 ( 1%) 

5 ( 2%) 

4 ( 5%) 

4 (11%) 

> 

400 

51 ( 1%) 

22 ( 2%) 

18 ( 5%) 

16 (11%) 


800 

227 ( 1%) 

100 ( 2%) 

84 ( 6%) 

67 (13%) 

> 

1600 

1051 ( 1%) 

496 ( 2%) 

362 ( 6%) 

320 (12%) 


3200 

7170 ( 0%) 

3006 ( 2%) 

2254 ( 4%) 

1896 (10%) 

s 

200 

18 ( 1%) 

9 ( 1%) 

6 ( 4%) 

6 ( 5%) 

w 

400 

92 ( 0%) 

43 ( 1%) 

32 ( 3%) 

27 ( 6%) 

-CO 

Cl, 

800 

440 ( 0%) 

190 ( 1%) 

149 ( 3%) 

121 ( 7%) 

(N 

1600 

1902 ( 0%) 

853 ( 1%) 

635 ( 4%) 

498 ( 8%) 

l> 

3200 

10223 ( 0%) 

4123 ( 1%) 

3183 ( 3%) 

2706 ( 7%) 


200 

30 ( 0%) 

15 ( 1%) 

11 ( 3%) 

9 ( 6%) 

’o 

400 

147 ( 0%) 

60 ( 1%) 

46 ( 3%) 

42 ( 7%) 

-+J 

m 

800 

644 ( 0%) 

263 ( 1%) 

207 ( 3%) 

172 ( 7%) 

3 

1600 

2778 ( 0%) 

1160 ( 1%) 

883 ( 4%) 

727 ( 8%) 


3200 

14958 ( 0%) 

6202 ( 1%) 

5017 ( 3%) 

4625 ( 6%) 


200 

198 ( 0%) 

57 ( 1%) 

17 ( 1%) 

11 ( 3%) 


400 

X 

321 ( 0%) 

86 ( 1%) 

45 ( 4%) 

i> 

800 

X 

1565 ( 0%) 

484 ( 2%) 

225 ( 5%) 


1600 

X 

7900 ( 1%) 

2171 ( 2%) 

1012 ( 5%) 


the relationship h = H 2 , we arrive at the convergent bound 

I|Rq u o - u|| A < cff V2 M^ (n) . (31) 

Figure 6 shows the Poisson problem B with these parameters. Each subdomain 
has a number of nodes equal to the number of subdomains, so the coarse matrix 
and the subdomain matrices are a similar size, which is an interesting point in 
the design space. As in all overlapping DD methods, the smoother is very sensi- 
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Figure 5: The number of required iterations decreases rapidly with polynomial 
degree used in the coarse basis (parameters: h -1 = 1000, [ H /h] = 125, A = 2). 
For comparison, when using p = 1 but aggregating using smaller subdomains, 
the iteration count is much higher for the same number of coarse variables. 


tive to H/S. To keep it approximately constant, we set A = \_ H /4h\. The minor 
saw-tooth pattern in the graph comes directly from the remaining variation in 
H/S. With higher polynomial degrees, h = H s for higher powers of s would be 
possible. 

8 Discussion 

There are a number of outstanding questions raised by this work. We’ve shown a 
bound on the error of the coarse grid, dependent on p and the smoothness of the 
solution. Ideally, we would have a thorough understanding of the relationship 
between all of the parameters ( H , h, p, S , and q ) and the condition number or 
number of iterations to converge. We leave closing this gap in the analysis for 
future work. 

On the more practical side, the generating vectors F are not difficult to sup¬ 
ply, but it would be more convenient to construct similar high-order coarse grids 
directly from A. Also, our approach still has the same undesirable dependency 
on [ H /h] that is present in many algebraic approach, but is not present in ge¬ 
ometric methods. Following the connection between the coarse basis and DG, 
we have done some preliminary work on algebraically constructing a DG-like 
discretization that is independent of [ H /h ], but with mixed success. 
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Poisson convergence with h=H 2 
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Figure 6: Our grid is convergent and has approximately /i-independent conver¬ 
gence, even when [ H /h] is not fixed. Parameters: h = H 2 , 26 « H, p = 1. 

9 Conclusion 

We have presented an algebraic coarse grid construction that produces a con¬ 
vergent rediscretization of the PDE for a wide variety of PDEs. It works for 
both scalar- and vector-valued problems, both second- and fourth-order PDEs, 
and both smooth and discontinuous coefficients. The high-order DG-like coarse 
basis is easy to construct algebraically, and Galerkin projection generates a 
high-order convergent coarse rediscretization of the input problem. Combined 
with DD and CG, we observe convergence in a number of iterations nearly inde¬ 
pendent of problem size. Furthermore, increasing the polynomial degree rapidly 
reduces the number of required iterations: the fastest solves used high-degree 
polynomials. 
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